Data Collection

Most Firefox features operate entirely on-device. Any feature that involves one or more connections to a Mozilla server qualifies as data collection. Even if the request data is not actually retained by Mozilla, it should be reviewed as if it might be, so that the privacy properties of Firefox can be verified without relying on retention commitments.

As part of our overall vision for privacy, we hold Firefox to an unusually high standard with respect to handling user data. Accordingly, any data collection needs to be carefully vetted for consistency with this standard. Our engineering processes are designed to ensure this vetting happens consistently, but browsers are extremely complex and mistakes can happen. As such, everyone who works on Firefox is responsible for understanding our rules for data and speaking up if something doesn’t look right.

This document outlines our approach and policies on a few key topics.

User Control

Users must be able to disable any network connection from the client to Mozilla. Absent a good reason, this should be possible as a supported configuration in the browser UI. In the rare situations where we do have a good reason not to offer a control in Firefox settings (e.g., fetching the malicious add-ons blocklist), there must still be a documented mechanism to disable the connection in about:config.
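<br>
As an illustration of this requirement, here is a minimal sketch of the pattern (in Python, purely for exposition; Firefox itself is written in C++, Rust, and JavaScript, and the preference name below is hypothetical): every client-to-Mozilla connection is gated on a preference the user can flip, whether from the settings UI or an about:config-style override.

    # Illustrative sketch only; the pref name is hypothetical, not a real
    # Firefox preference. The point is that the network request simply does
    # not happen when the controlling pref is switched off.

    PREFS = {
        "services.blocklist.update_enabled": True,  # hypothetical default
    }

    def fetch_blocklist(prefs: dict) -> None:
        """Fetch the malicious add-ons blocklist, but only if the user allows it."""
        if not prefs.get("services.blocklist.update_enabled", True):
            return  # the user disabled this connection; make no request at all
        # ... perform the HTTPS request to the blocklist service here ...

    fetch_blocklist(PREFS)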

Browsing Data

A longstanding tenet of Firefox development is that even Mozilla shouldn’t be able to learn what a user does online — sites they visit, what they do on them, etc. This is different from many other browsers and internet applications, where the vendor routinely collects and stores sensitive user data on its servers. Rather than asking users to trust Mozilla with this information, Firefox aims to provide verifiable guarantees of secrecy: someone should be able to inspect the source code and verify that this data is never revealed in the first place. There are various edge-case exceptions to this posture, but that’s the big picture.

The simplest guarantee is inspectable source code that never transmits the data[1]. This is how Firefox handles browsing data modulo a very small number of exceptions. Those exceptions are situations where we use some form of encryption to create a verifiable guarantee for an important online use-case. For example, the history and bookmark sync feature for Firefox Accounts uses end-to-end encryption to store browsing history on Mozilla’s servers without Mozilla learning the contents. The approved technologies for verifiable guarantees are outlined below.
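<br>
To make the sync example concrete, here is a minimal sketch of the end-to-end encryption idea (not Mozilla’s actual sync protocol), written in Python and assuming the third-party cryptography package. The encryption key is derived from a secret held only on the user’s devices, so the server only ever stores an opaque blob.

    # Minimal sketch of end-to-end encrypted sync; not Firefox's real protocol.
    # Assumes the third-party `cryptography` package. Only ciphertext ever
    # leaves the device, so the storage server cannot read the records.

    import json, os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    def derive_sync_key(account_secret: bytes) -> bytes:
        """Derive a 256-bit key from a secret that only the user's devices hold."""
        return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                    info=b"sync-history-key").derive(account_secret)

    def encrypt_record(key: bytes, record: dict) -> bytes:
        """Encrypt one history/bookmark record on the device."""
        nonce = os.urandom(12)
        return nonce + AESGCM(key).encrypt(nonce, json.dumps(record).encode(), None)

    def decrypt_record(key: bytes, blob: bytes) -> dict:
        """Another of the user's devices can decrypt; the server cannot."""
        return json.loads(AESGCM(key).decrypt(blob[:12], blob[12:], None))

    key = derive_sync_key(os.urandom(32))  # stand-in for the real account secret
    blob = encrypt_record(key, {"url": "https://example.com/", "title": "Example"})
    assert decrypt_record(key, blob)["url"] == "https://example.com/"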

The consequence of these restrictions on sensitive data is that nearly all of the data transmitted by Firefox to Mozilla falls into the not-particularly-sensitive bucket. This includes the data exchanged to power various cloud-supported features (updates, add-ons, push notifications, etc.) as well as measurement telemetry (described in the next section).

Telemetry and Experiments

Firefox contains various measurement probes to help us understand and improve the browser, loosely known within Mozilla as “Telemetry”[2]. This instrumentation is enabled by default, but can be disabled during onboarding, in settings, or through various other mechanisms (e.g., enterprise policies). Some representative probes include OS version, memory usage, CSS use-counters, and the number of interactions with the bookmarks bar. In addition to telemetry, other measurement probes collect data on a more de-identified basis to measure daily usage and to support certain engagement and attribution purposes.
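<br>
For a sense of what this low-sensitivity data looks like in practice, here is an illustrative Python sketch of the kind of payload described above. The field names and shape are invented for exposition and are not Glean’s actual schema; note that nothing in it describes web content or browsing history.

    # Illustrative only: these field names are not Glean's real schema, just a
    # sketch of low-sensitivity technical and interaction data. Nothing here
    # describes web content or browsing history.

    import platform

    def build_example_ping(memory_mb: int, bookmarks_bar_clicks: int) -> dict:
        return {
            "os": platform.system(),           # e.g. "Linux", "Windows"
            "os_version": platform.release(),  # coarse OS version string
            "memory_usage_mb": memory_mb,      # resource usage measurement
            "css_use_counters": {"grid": 12, "subgrid": 1},  # feature usage counts
            "bookmarks_bar_interactions": bookmarks_bar_clicks,  # Firefox UI, not web content
        }

    print(build_example_ping(memory_mb=812, bookmarks_bar_clicks=3))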

Sitting atop this infrastructure is an optional experimentation system. This allows us to deploy features to subsets of our user base to ensure they perform as expected. For example, we might deploy a new network protocol backend to 1% of our users to ensure it doesn’t increase average connection times or failure rates.
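<br>
One way such a partial rollout can work is deterministic bucketing: hash a stable per-profile value into a number in [0, 1) and enroll the profiles that land below the rollout fraction. The sketch below illustrates the idea in Python; it is not Firefox’s actual experimentation code, and the experiment name is invented.

    # Hedged sketch of deterministic 1% bucketing; not Firefox's actual
    # experimentation code. Hashing (experiment, client) gives a stable,
    # uniformly distributed value, so roughly 1% of profiles enroll.

    import hashlib

    def bucket(client_id: str, experiment_slug: str) -> float:
        """Map (client, experiment) to a stable value in [0, 1)."""
        digest = hashlib.sha256(f"{experiment_slug}:{client_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def enrolled(client_id: str, experiment_slug: str, fraction: float) -> bool:
        return bucket(client_id, experiment_slug) < fraction

    # Hypothetical example: roll a new network backend out to 1% of profiles.
    print(enrolled("2f9c5a6e-example-client-id", "new-network-backend", 0.01))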

Building a full-stack, web-compatible browser is extremely complicated, and there is no realistic way to do it without representative telemetry and experimentation. For example, page-load speed depends on many factors like network conditions and hardware quirks which cannot be exhaustively tested in automation. Telemetry allows Mozilla to determine how Firefox is performing for users, and measure whether big changes make things faster or slower before deploying them to everyone. The browsers that brag about not having telemetry all use someone else’s engine (generally Chromium), and thus rely on the engine vendor to collect telemetry and tune the stack correctly. We strive to keep Firefox independent and competitive, so we need infrastructure to tell us what is and is not working well.

Ordinary telemetry is associated with a pseudonymous identifier called a client ID. Our data infrastructure endeavors to make it difficult to associate a client ID with identifiable data, but this is not a strong guarantee. Therefore, ordinary telemetry is generally restricted to low-sensitivity technical and interaction data. Note that “interaction” here refers to interaction with Firefox UI, not web content. The latter would inherently reveal browsing data, and is thus off-limits.
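<br>
A minimal sketch of what “pseudonymous” means here, in Python and purely for illustration: the client ID is a randomly generated UUID that is not derived from hardware, accounts, or browsing data, and it can be reset, which severs the link between past and future telemetry.

    # Illustrative sketch, not Firefox's actual implementation: the client ID
    # is random, carries no user information by itself, and can be reset.

    import uuid

    class ClientId:
        def __init__(self) -> None:
            self._value = str(uuid.uuid4())  # random, not derived from anything identifying

        @property
        def value(self) -> str:
            return self._value

        def reset(self) -> None:
            """Resetting unlinks future telemetry from past telemetry."""
            self._value = str(uuid.uuid4())

    cid = ClientId()
    ping = {"client_id": cid.value, "os": "Linux"}  # low-sensitivity fields only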

Verifiable Guarantees

As discussed above, sensitive information like browsing data must be protected by a verifiable guarantee of secrecy (modulo the exceptions listed below). This section outlines the current mechanisms Firefox uses to provide such a guarantee in different situations:

  1. On-Device Processing: This is the default, and should be used wherever possible.

  2. End-to-End Encryption: This is used for situations where Mozilla needs to store user data as an opaque payload. The bookmark, history, and password sync feature is the canonical use-case. To be clear, the ‘ends’ of this type of End-to-End encryption are the user’s own devices, and they exclude Mozilla.

  3. Oblivious HTTP: OHTTP is an IETF standard for concealing the client’s IP address in HTTPS transactions, which can be used to create a verifiable guarantee that a network service cannot link a request to a client. It does this by routing the request through an independently-operated relay (in our case, Fastly). The protocol ensures that the relay provider sees the source of the request but not the contents, and the endpoint sees the contents but not the source (a conceptual sketch of this split appears after this list). For this to work, the payload must be carefully vetted to ensure that its contents are non-identifying. There are obvious ways to get this wrong (e.g., including any sort of personal identifier), but subtler ones as well (e.g., a set of innocuous values that could be jointly unique to a user). For this reason, any usage of OHTTP requires careful analysis from a privacy expert as part of data review.

  4. DAP/Prio: DAP is a standards-track Multi-Party Computation (MPC) aggregate measurement protocol with formally verifiable privacy guarantees. It allows computing aggregate statistics across a population (e.g., how many users visit this page with a known web-compat issue) without the individual data points being revealed to any party off the device (a toy sketch of the underlying secret-sharing idea also appears after this list). There are a lot of complicated details, but an important upshot is that the protocol incorporates differential privacy guards to make it virtually impossible to inadvertently leak individual information with too small a sample (it does this by automatically adding noise whose magnitude is inversely proportional to the sample size). Firefox’s DAP node is operated by ISRG, which also operates Let’s Encrypt.
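<br>
To make the relay/gateway split in item 3 concrete, here is a conceptual Python sketch. It is not a real OHTTP implementation (the actual standard uses HPKE key encapsulation; RSA-OAEP from the cryptography package stands in purely for illustration), and the request path is invented, but it shows why the relay cannot read the request and the gateway cannot see who sent it.

    # Conceptual sketch of the OHTTP split, not a real implementation (the
    # actual standard uses HPKE key encapsulation; RSA-OAEP stands in here
    # purely for illustration). The relay learns who is asking but not what;
    # the gateway learns what is asked but not by whom.

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    gateway_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    gateway_public = gateway_private.public_key()
    OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # Client: encrypt the (carefully vetted, non-identifying) payload to the
    # gateway's public key before it ever touches the network.
    payload = b"GET /example-service/lookup?v=3"  # hypothetical request
    encapsulated = gateway_public.encrypt(payload, OAEP)

    # Relay (independently operated): sees the client's IP address and an
    # opaque blob, and forwards the blob without being able to read it.
    forwarded = encapsulated

    # Gateway/target: decrypts the payload, but only ever sees the relay's address.
    assert gateway_private.decrypt(forwarded, OAEP) == payload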
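<br>
The aggregation in item 4 rests on additive secret sharing. The toy Python sketch below shows that idea only; the real protocol adds zero-knowledge validity proofs, aggregators run by independent parties, and the differential privacy noise mentioned above.

    # Toy sketch of the secret-sharing idea underneath Prio/DAP, not the actual
    # protocol. Each client splits its measurement into two random-looking
    # shares; each aggregator sees only one share, yet the share sums combine
    # into the true aggregate without exposing any individual value.

    import secrets

    MODULUS = 2**61 - 1  # arithmetic is done modulo a large prime

    def share(measurement: int) -> tuple[int, int]:
        """Split one client's 0/1 measurement into two additive shares."""
        share_a = secrets.randbelow(MODULUS)         # uniformly random: reveals nothing
        share_b = (measurement - share_a) % MODULUS  # also looks random on its own
        return share_a, share_b

    # e.g. "did this profile hit the known web-compat issue on that page?"
    clients = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
    shares = [share(m) for m in clients]

    # Aggregator A and aggregator B each sum only the shares they received...
    sum_a = sum(a for a, _ in shares) % MODULUS
    sum_b = sum(b for _, b in shares) % MODULUS

    # ...and only the combined sums reveal the aggregate count.
    assert (sum_a + sum_b) % MODULUS == sum(clients)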

Exceptions

There are a few exceptional cases where information related to a website visited by the user is sent to Mozilla without a verifiable guarantee. These are generally unsurprising and self-explanatory, but it’s worth writing them down. If you discover one that isn’t listed here, please flag it to the Firefox Technical Leadership Committee so that it can be either addressed or added to this list:

  • Specific opt-in consent: For example, submitting a crash report with a memory dump (which, depending on the crash location and the compiler memory layout, could include data like URLs).

  • Explicit user action: For example, submitting a report to us that a site is broken.

  • Site-specific feature integrations for widely-used sites: For example, we learn that users visit Google when they search with it, and we learned that users visited Facebook when they received the contextual prompt to install the Facebook container.

  • Visiting a Mozilla-operated website: Mozilla, like any website operator, has the technical capability to observe which websites are loaded by a given IP address. Some sites, like addons.mozilla.org, also have special hooks to deliver browser functionality.

  • The New-Tab Content Feed: Firefox provides an optional feed of news articles and other content on the Home and New Tab pages. This was originally designed to operate somewhat like a website, so the server is notified when a story is clicked. We are investigating routing these notifications through OHTTP in order to remove this exception.

Data Review

Any data collection introduced to Firefox requires careful review. Our code review system automatically detects the most common patterns (e.g., new or modified glean probes) and flags any matching changesets for classification. However, these heuristics may not catch unusual patterns, and so code reviewers are responsible for manually flagging anything that slips through the cracks.
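<br>
As an illustration of the kind of automated check described above, here is a hedged Python sketch that flags changesets touching Glean probe definitions for data review. The real tooling in Mozilla’s review system is more sophisticated; the paths and logic here are illustrative only.

    # Hedged sketch of an automated data-review trigger; the real review-system
    # tooling is more sophisticated. Changesets that touch Glean probe
    # definition files get flagged for classification by a human.

    from dataclasses import dataclass

    @dataclass
    class Changeset:
        files: list[str]

    def needs_data_review(change: Changeset) -> bool:
        """Flag changesets that modify probe definitions."""
        return any(path.endswith(("metrics.yaml", "pings.yaml")) for path in change.files)

    print(needs_data_review(Changeset(files=["browser/components/metrics.yaml"])))  # True
    print(needs_data_review(Changeset(files=["layout/style/nsCSSProps.cpp"])))      # False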

The details of the data review process for Firefox patches are documented here.